Offload intrinsic #147936

Sa4dUs · 2025-10-21T07:50:39Z

This PR implements the minimal mechanisms required to run a small subset of arbitrary offload kernels without relying on hardcoded names or metadata.

offload(kernel, (..args)): an intrinsic that generates the necessary host-side LLVM-IR code.
rustc_offload_kernel: a builtin attribute that marks device kernels to be handled appropriately.

Example usage (pseudocode):

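// Host-side wrapper: asks the compiler to offload `kernel_1` with the given arguments.
fn kernel(x: *mut [f64; 128]) {
    core::intrinsics::offload(kernel_1, (x,))
}

// Host build: only the kernel's declaration is visible.
#[cfg(target_os = "linux")]
extern "C" {
    pub fn kernel_1(array_b: *mut [f64; 128]);
}

// Device build: the kernel body, marked so the compiler handles it as an offload kernel.
#[cfg(not(target_os = "linux"))]
#[rustc_offload_kernel]
extern "gpu-kernel" fn kernel_1(x: *mut [f64; 128]) {
    unsafe { (*x)[0] = 21.0 };
}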

ZuseZ4 · 2025-10-24T00:31:36Z

compiler/rustc_middle/src/ty/offload_meta.rs

+    }
+
+    pub fn from_ty<'tcx>(tcx: TyCtxt<'tcx>, ty: Ty<'tcx>) -> Self {
+        OffloadMetadata { payload_size: get_payload_size(tcx, ty), mode: TransferKind::Both }


If you already have the code here, I would add a small check for & or by-value arguments (which imply mode ToGPU) versus &mut (which implies Both).

In the future we hope to analyze the & or by-value case further: if we never read from the argument before writing to it, we could use a new mode 4 that allocates directly on the GPU.
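
A minimal sketch of the suggested check, written as a standalone program rather than compiler code. `ArgKind`, `from_arg`, and the use of `size_of` in place of `get_payload_size` are assumptions made for illustration; the real `from_ty` works on `Ty<'tcx>` and may differ in every detail.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the transfer modes named in the review.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TransferKind {
    /// Data is only copied from host to device.
    ToGpu,
    /// Data is copied to the device and back to the host.
    Both,
}

/// Hypothetical, simplified view of how a kernel argument is passed.
#[derive(Debug, Clone, Copy)]
enum ArgKind {
    ByValue,
    SharedRef,
    MutRef,
}

#[derive(Debug, PartialEq, Eq)]
struct OffloadMetadata {
    payload_size: u64,
    mode: TransferKind,
}

impl OffloadMetadata {
    /// Picks the transfer mode from the argument kind instead of always using `Both`.
    /// `size_of::<T>()` is an assumed substitute for `get_payload_size`.
    fn from_arg<T>(kind: ArgKind) -> Self {
        let mode = match kind {
            // The device cannot hand changes back through `&` or a by-value copy.
            ArgKind::ByValue | ArgKind::SharedRef => TransferKind::ToGpu,
            // `&mut` may be written on the device, so copy the result back.
            ArgKind::MutRef => TransferKind::Both,
        };
        OffloadMetadata { payload_size: size_of::<T>() as u64, mode }
    }
}

fn main() {
    let read_only = OffloadMetadata::from_arg::<[f64; 128]>(ArgKind::SharedRef);
    let read_write = OffloadMetadata::from_arg::<[f64; 128]>(ArgKind::MutRef);
    println!("{read_only:?}");
    println!("{read_write:?}");
}
```

The "mode 4" idea from the comment is left out because it would need a read-before-write analysis that this sketch does not attempt.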

bors · 2025-11-05T11:53:48Z

☔ The latest upstream changes (presumably #148507) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2025-11-09T13:03:34Z

☔ The latest upstream changes (presumably #148721) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot · 2025-11-16T10:27:24Z

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

bors · 2025-11-18T15:17:18Z

☔ The latest upstream changes (presumably #148151) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot · 2025-11-22T17:06:55Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

ZuseZ4 · 2025-11-25T20:35:47Z

This removes a good amount of the hacks from my first MVP; further improvements can land in a follow-up PR.
Verified to work on an MI 250X.

@bors r+

bors · 2025-11-25T20:35:50Z

📌 Commit f39ec47 has been approved by ZuseZ4

It is now in the queue for this repository.

Zalathar · 2025-11-27T02:04:40Z

Perf results from rollup:

Some regressions in the large-workspace secondary benchmark, perhaps due to the additional metadata.

Zalathar · 2025-11-27T02:20:37Z

compiler/rustc_codegen_llvm/src/back/write.rs

        // For now we only support up to 10 kernels named kernel_0 ... kernel_9, a follow-up PR is
        // introducing a proper offload intrinsic to solve this limitation.


Looks like this comment is now outdated as of this PR.

Thanks. We have a few follow-up PRs where I'll add it.

ZuseZ4 · 2025-11-27T18:39:07Z

@Zalathar We add metadata and do extra work if and only if the -Zoffload=Enable flag is set, which that benchmark doesn't do. The relevant check is:
if cgcx.target_is_like_gpu && config.offload.contains(&config::Offload::Enable)

We also introduce one more intrinsic, so I guess the match on intrinsics in compiler/rustc_codegen_llvm/src/intrinsic.rs gets slightly larger. Maybe large-workspace uses a lot of intrinsics, so the slightly larger code for that match causes a perf impact?
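
To make the gating argument concrete, here is a small standalone sketch of that condition. `CodegenContext`, `ModuleConfig`, and `should_do_offload_work` are hypothetical stand-ins rather than the compiler's real types; only the shape of the boolean check is taken from the line quoted above.

```rust
/// Hypothetical mirror of the `-Zoffload` options; only `Enable` is mentioned in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Offload {
    Enable,
}

/// Hypothetical, reduced codegen context.
struct CodegenContext {
    target_is_like_gpu: bool,
}

/// Hypothetical, reduced module config holding the parsed `-Zoffload` values.
struct ModuleConfig {
    offload: Vec<Offload>,
}

/// Offload-specific work runs only for GPU-like targets with offload enabled.
fn should_do_offload_work(cgcx: &CodegenContext, config: &ModuleConfig) -> bool {
    cgcx.target_is_like_gpu && config.offload.contains(&Offload::Enable)
}

fn main() {
    let gpu = CodegenContext { target_is_like_gpu: true };
    let cpu = CodegenContext { target_is_like_gpu: false };
    let enabled = ModuleConfig { offload: vec![Offload::Enable] };
    let disabled = ModuleConfig { offload: vec![] };

    // A build without the flag (as in the benchmark) never takes the offload path.
    println!("gpu + flag:    {}", should_do_offload_work(&gpu, &enabled));
    println!("gpu, no flag:  {}", should_do_offload_work(&gpu, &disabled));
    println!("cpu + flag:    {}", should_do_offload_work(&cpu, &enabled));
}
```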

ZuseZ4 self-assigned this Oct 21, 2025

Sa4dUs force-pushed the offload-intrinsic branch from 9118683 to 23722aa Compare October 21, 2025 19:45

ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Oct 22, 2025

ZuseZ4 reviewed Oct 24, 2025

ZuseZ4 mentioned this pull request Oct 24, 2025

Tracking Issue for GPU-offload #131513

Open

5 tasks

Sa4dUs force-pushed the offload-intrinsic branch from e0fd7be to 97a8e96 Compare November 7, 2025 15:37

rustbot added the A-attributes Area: Attributes (`#[…]`, `#![…]`) label Nov 11, 2025

ZuseZ4 reviewed Nov 14, 2025

compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs (outdated, resolved)

ZuseZ4 reviewed Nov 14, 2025

compiler/rustc_codegen_llvm/src/intrinsic.rs (outdated, resolved)

ZuseZ4 reviewed Nov 14, 2025

compiler/rustc_middle/src/ty/offload_meta.rs (outdated, resolved)

Sa4dUs force-pushed the offload-intrinsic branch from 3540edb to e9d89ce Compare November 14, 2025 22:33

Sa4dUs force-pushed the offload-intrinsic branch from e9d89ce to a08949b Compare November 15, 2025 09:49

Sa4dUs force-pushed the offload-intrinsic branch from a08949b to 9397d31 Compare November 15, 2025 11:24

Sa4dUs force-pushed the offload-intrinsic branch from 9397d31 to 7666b58 Compare November 16, 2025 09:10

Sa4dUs marked this pull request as ready for review November 16, 2025 10:27

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 16, 2025

Sa4dUs force-pushed the offload-intrinsic branch from 1a7e216 to 0b71052 Compare November 22, 2025 17:06

Sa4dUs force-pushed the offload-intrinsic branch 2 times, most recently from c39e4e5 to ce5970c Compare November 22, 2025 18:19

Sa4dUs added 2 commits November 25, 2025 20:04

Implement offload intrinsic

5128ce1

Update rustc-dev-guide

f39ec47

Sa4dUs force-pushed the offload-intrinsic branch from ce5970c to f39ec47 Compare November 25, 2025 19:36

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 25, 2025

jhpratt mentioned this pull request Nov 26, 2025

Rollup of 9 pull requests #149342

Closed

Zalathar mentioned this pull request Nov 26, 2025

Rollup of 12 pull requests #149351

Merged

bors merged commit 2b150f2 into rust-lang:main Nov 26, 2025
11 checks passed

rustbot added this to the 1.93.0 milestone Nov 26, 2025

Zalathar reviewed Nov 27, 2025

Mark-Simulacrum added the perf-regression Performance regression. label Dec 3, 2025
